Details of the input data

First group of samples (to be referred to as control in the rest of the report)

Sample Names:
GSM2858678
GSM2858686
GSM2858688
GSM2858692
GSM2858694
GSM2858699
GSM2858701
GSM2858705
GSM2858713
GSM2858726
GSM2858727
GSM2858739
GSM2858742
GSM2858746
GSM2858748
GSM2858749
GSM2858761
GSM2858766
GSM2858767
GSM2858780
GSM2858792

Second group of samples (to be referred to as treatment in the rest of the report)

Sample Names:
GSM2858682
GSM2858690
GSM2858707
GSM2858711
GSM2858716
GSM2858718
GSM2858729
GSM2858760
GSM2858772
GSM2858789

Note: A positive log fold change shows higher expression in the treatment group; a negative log fold change represents higher expression in the control group.
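To illustrate the sign convention, the sketch below computes a naive log2 ratio of group means (treatment over control) with a pseudocount of 1. The function name and pseudocount are hypothetical, and this is not how DESeq2 estimates fold changes (DESeq2 fits a model per gene); it only shows which direction yields a positive value.

```python
import numpy as np

def naive_log2_fold_change(control_counts, treatment_counts, pseudocount=1.0):
    """Naive log2(treatment mean / control mean); NOT DESeq2's model-based estimate."""
    control_mean = np.mean(control_counts) + pseudocount
    treatment_mean = np.mean(treatment_counts) + pseudocount
    return float(np.log2(treatment_mean / control_mean))

# Higher in treatment -> positive; higher in control -> negative.
print(naive_log2_fold_change([9, 9, 9], [39, 39]))   # 2.0  (40 / 10 = 4)
print(naive_log2_fold_change([39, 39], [9, 9, 9]))   # -2.0
```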

Data quality control (QC)

Correlation between samples:

Here we show scatterplots comparing expression levels for all genes between samples, for i) all control samples, ii) all treatment samples and iii) all samples together.

These plots will only be produced when the total number of samples to compare within a group is less than or equal to 10.

Correlation between treatment samples:

Replicates within the same group tend to have Pearson correlation coefficients >= 0.96. Lower values may indicate problems with the samples.
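A minimal sketch of this check, assuming a genes x samples count matrix: compute all pairwise Pearson coefficients between replicates and flag pairs below 0.96. The report does not say whether raw, normalized or log-transformed counts are correlated, so the input scale is an assumption, and the function name is hypothetical.

```python
import itertools
import numpy as np

def low_correlation_pairs(counts, sample_names, threshold=0.96):
    """Return (sample_a, sample_b, r) for replicate pairs with Pearson r below threshold.

    counts: genes x samples array, one column per replicate.
    """
    corr = np.corrcoef(np.asarray(counts, dtype=float), rowvar=False)  # samples x samples
    flagged = []
    for i, j in itertools.combinations(range(len(sample_names)), 2):
        if corr[i, j] < threshold:
            flagged.append((sample_names[i], sample_names[j], round(float(corr[i, j]), 3)))
    return flagged

# Hypothetical counts for three replicates; s3 does not track s1 and s2.
counts = np.array([[10, 11, 50],
                   [20, 21, 5],
                   [30, 29, 40],
                   [40, 41, 10]])
print(low_correlation_pairs(counts, ["s1", "s2", "s3"]))  # only pairs involving s3
```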

Heatmap and clustering showing correlation between replicates

BROWN: higher correlation; YELLOW: lower

Principal Component Analysis

This is a PCA plot of the count values following rlog normalization from the DESeq2 package:

The samples are shown in the 2D plane, positioned by their first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates and batch effects, and for identifying outlier samples. Control and treatment samples may each form a separate cluster.
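The projection behind such a plot can be sketched with a singular value decomposition. Two assumptions apply: log2(x + 1) of normalized counts stands in for DESeq2's rlog transform, which has no NumPy equivalent, and only the `ntop` most variable genes are kept, following the default of 500 in DESeq2's `plotPCA`. The function name is hypothetical.

```python
import numpy as np

def pca_scores(norm_counts, n_components=2, ntop=500):
    """Sample coordinates on the first principal components.

    norm_counts: genes x samples matrix of normalized counts.
    log2(x + 1) is a stand-in for DESeq2's rlog (assumption).
    """
    logged = np.log2(np.asarray(norm_counts, dtype=float) + 1.0)
    top = np.argsort(logged.var(axis=1))[::-1][:ntop]   # most variable genes
    samples = logged[top].T                             # samples x genes
    centered = samples - samples.mean(axis=0)           # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]     # PC coordinates per sample
    explained = s ** 2 / np.sum(s ** 2)                 # variance fraction per PC
    return scores, explained[:n_components]

# Hypothetical data: 3 genes, two samples per group.
counts = np.array([[100, 100, 10, 10],
                   [10, 10, 100, 100],
                   [50, 50, 50, 50]])
scores, explained = pca_scores(counts)
print(np.round(scores[:, 0], 2), np.round(explained, 3))
```

In this toy example PC1 separates the two groups, which is the pattern the plot is meant to reveal.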

Visualizing normalization results

These boxplots show the distributions of count data before and after normalization (shown for normalization method DESeq2):

Representation of unfiltered data as counts per million (CPM):

Before normalization:

After normalization:
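DESeq2 normalization divides each sample's counts by a per-sample size factor estimated with the median-of-ratios method. The NumPy sketch below reproduces that estimator in simplified form (no control genes or alternative estimators); the function names are hypothetical.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """DESeq2-style size factors for a genes x samples matrix of raw counts.

    Genes with a zero in any sample are excluded from the geometric-mean
    reference, as in DESeq2's default estimator.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)            # per-gene pseudo-reference
    log_ratios = log_counts - log_geo_means[:, None]   # each sample vs reference
    return np.exp(np.median(log_ratios, axis=0))       # one factor per sample

def normalize(counts):
    """Divide every column (sample) by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)

# Hypothetical: sample 2 was sequenced twice as deeply as sample 1.
raw = np.array([[10, 20], [100, 200], [5, 10]])
print(median_of_ratios_size_factors(raw))  # sample 2 gets twice the factor of sample 1
print(normalize(raw))                      # both columns become identical
```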

Gene counts variance distribution

The variance of gene counts across samples is shown. Genes with variance below the selected threshold (dashed grey line) were filtered out.
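A minimal sketch of such a filter, assuming a genes x samples matrix: compute each gene's sample variance and keep genes at or above the threshold. The report states neither the threshold value nor whether variance is computed on raw, CPM or transformed counts, so both are assumptions here, and the function name is hypothetical.

```python
import numpy as np

def filter_low_variance(counts, gene_ids, threshold):
    """Drop genes whose variance across samples is below threshold."""
    counts = np.asarray(counts, dtype=float)
    variances = counts.var(axis=1, ddof=1)   # sample variance per gene
    keep = variances >= threshold
    kept_ids = [g for g, k in zip(gene_ids, keep) if k]
    return counts[keep], kept_ids, variances

kept, ids, variances = filter_low_variance([[5, 5, 5], [1, 10, 20]], ["g1", "g2"], threshold=1.0)
print(ids)  # ['g2']: g1 has zero variance
```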

Sample differences based on all normalized counts:

All counts were normalized with the DESeq2 algorithm, log10-scaled and plotted in a heatmap.

Percentages of reads per sample mapping to the most highly expressed genes

Gene GSM2858678 GSM2858686 GSM2858688 GSM2858692 GSM2858694 GSM2858699 GSM2858701 GSM2858705 GSM2858713 GSM2858726 GSM2858727 GSM2858739 GSM2858742 GSM2858746 GSM2858748 GSM2858749 GSM2858761 GSM2858766 GSM2858767 GSM2858780 GSM2858792 GSM2858682 GSM2858690 GSM2858707 GSM2858711 GSM2858716 GSM2858718 GSM2858729 GSM2858760 GSM2858772 GSM2858789
100169751 0.014 0.015 0.014 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014
100169760 0.014 0.015 0.014 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014
4535 0.014 0.015 0.014 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014
6085 0.014 0.015 0.014 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014
4536 0.014 0.015 0.014 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.014 0.015 0.015 0.015 0.015 0.015 0.015 0.014 0.015 0.014 0.014 0.014 0.015 0.015 0.015 0.015 0.014 0.015 0.014
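A table of this shape can be sketched as follows: pick the genes with the most reads and express each one's counts as a percentage of each sample's library size. Ranking genes by total counts across all samples is an assumption (the report does not say how "most highly expressed" is determined), `n_top=5` simply matches the five rows above, and the function name is hypothetical.

```python
import numpy as np

def top_gene_read_percentages(counts, gene_ids, n_top=5):
    """Percent of each sample's reads mapping to the n_top genes with most total reads.

    counts: genes x samples matrix of raw counts.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)                    # reads per sample
    order = np.argsort(counts.sum(axis=1))[::-1][:n_top]  # highest totals first
    percentages = 100.0 * counts[order] / library_sizes   # column-wise division
    return [gene_ids[i] for i in order], percentages

ids, pct = top_gene_read_percentages([[90, 80], [5, 15], [5, 5]], ["a", "b", "c"], n_top=2)
print(ids)  # ['a', 'b']
print(pct)  # rows a and b, one column per sample
```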

DEgenes Hunter results

Gene classification by DEgenes Hunter

DEgenes Hunter uses multiple DE detection packages to analyse all genes in the input count table and labels them accordingly:

  • Filtered out: Genes discarded during the filtering process as showing no or very low expression.
  • Prevalent DEG: Genes considered differentially expressed (DE) by at least 1 package, as specified by the minpack_common argument.
  • Possible DEG: Genes considered DE by at least one of the DE detection packages.
  • Not DEG: Genes not considered DE in any package.
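The labelling rules above can be sketched as a count of how many packages call each gene DE. This sketch treats a call as FDR <= p_val_cutoff only; whether DEgenes Hunter also applies the lfc option (1 in this run) to each call is not stated here, so that is an assumption. The function name, label strings and package names in the example are hypothetical.

```python
def classify_genes(fdr_by_package, filtered_out, p_val_cutoff=0.05, minpack_common=1):
    """Label genes as FILTERED_OUT, PREVALENT_DEG, POSSIBLE_DEG or NOT_DEG.

    fdr_by_package: {package: {gene_id: FDR}} for genes that passed filtering.
    filtered_out: gene IDs removed by the expression filter.
    """
    labels = {gene: "FILTERED_OUT" for gene in filtered_out}
    genes = set().union(*(fdrs.keys() for fdrs in fdr_by_package.values()))
    for gene in genes - set(filtered_out):
        n_calls = sum(1 for fdrs in fdr_by_package.values()
                      if fdrs.get(gene, 1.0) <= p_val_cutoff)
        if n_calls >= minpack_common:
            labels[gene] = "PREVALENT_DEG"
        elif n_calls >= 1:
            labels[gene] = "POSSIBLE_DEG"
        else:
            labels[gene] = "NOT_DEG"
    return labels

fdr = {"pkg_A": {"g1": 0.01, "g2": 0.2, "g3": 0.04},
       "pkg_B": {"g1": 0.02, "g2": 0.3, "g3": 0.5}}
print(classify_genes(fdr, {"g4"}, minpack_common=2))
```

With minpack_common = 1, as in this run, every possible DEG is also a prevalent DEG, so the two categories coincide.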

This barplot shows the number of genes passing each stage of the analysis: from the total number of genes in the input count table, to the genes surviving the expression filter, to the genes detected as DE by at least one package, to the genes detected by at least 1 package (minpack_common).

Package DEG detection stats

This is the Venn diagram of all possible DE genes (DEGs) according to at least one of the DE detection packages employed:
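Each region of such a Venn diagram holds the genes called DE by exactly one combination of packages. A minimal sketch of counting those regions from per-package DEG sets follows; the function name and package names are hypothetical.

```python
def venn_regions(deg_sets):
    """Count genes in each exclusive Venn region.

    deg_sets: {package: set of gene IDs called DE by that package}.
    Returns {sorted tuple of packages: number of genes called by exactly those packages}.
    """
    regions = {}
    for gene in set().union(*deg_sets.values()):
        key = tuple(sorted(p for p, genes in deg_sets.items() if gene in genes))
        regions[key] = regions.get(key, 0) + 1
    return regions

print(venn_regions({"A": {"g1", "g2"}, "B": {"g2", "g3"}}))
```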

FDR gene-wise benchmarking

Benchmark of false positive calling:

Boxplot of FDR values among all genes with an FDR <= 0.05 in at least one DE detection package

No prevalent DEGs were found, so bar charts of FDR values for prevalent genes cannot be shown.

The complete results of the DEgenes Hunter differential expression analysis can be found in the “hunter_results_table.txt” file in the Common_results folder

DE detection package specific results

Various plots specific to each package are shown below:

DESeq2 normalization effects:

This plot compares the effective library size with the raw library size.

The effective library size is the factor used by the DESeq2 normalization algorithm for each sample. It is expected to depend on the raw library size.

DESeq2 MA plot:

This is the MA plot from DESeq2 package:

In DESeq2, the MA plot (log ratio versus abundance) shows the log2 fold changes attributable to a given variable over the mean of normalized counts. Points are colored red if the adjusted p-value is less than 0.1. Points that fall outside the window are plotted as open triangles pointing either up or down.
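The data behind such a plot can be prepared as below: x is the mean of normalized counts (drawn on a log axis), y is the log2 fold change clipped to the plotting window, points with adjusted p-value < 0.1 are flagged for coloring, and clipped points get an up or down triangle marker. The window of (-2, 2) and the function name are hypothetical; DESeq2's `plotMA` handles all of this internally.

```python
import numpy as np

def ma_plot_points(base_mean, log2_fc, padj, ylim=(-2.0, 2.0), alpha=0.1):
    """Return x, clipped y, significance flags and markers ('o', '^', 'v')."""
    base_mean = np.asarray(base_mean, dtype=float)
    log2_fc = np.asarray(log2_fc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    significant = np.nan_to_num(padj, nan=1.0) < alpha   # NA padj -> not significant
    y = np.clip(log2_fc, ylim[0], ylim[1])               # keep points inside the window
    markers = np.where(log2_fc > ylim[1], "^",
                       np.where(log2_fc < ylim[0], "v", "o"))
    return base_mean, y, significant, markers

x, y, sig, markers = ma_plot_points([10, 100, 1000], [3.0, -0.5, -5.0], [0.01, np.nan, 0.5])
print(y, sig, markers)
```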

A table containing the DESeq2 DEGs is provided in Results_DESeq2/DEgenes_DESEq2.txt.

A table containing the DESeq2 normalized counts is provided in Results_DESeq2/Normalized_counts_DESEq2.txt

Differences between samples based on normalized counts of PREVALENT DEGs:

Counts of prevalent DEGs were normalized with the DESeq2 algorithm, log10-scaled and plotted in a heatmap.

Fewer than 2 prevalent differentially expressed genes were found.

WGCNA Results

WGCNA was run to look for modules (clusters) of coexpressed genes. These modules were then compared with the sample factors to look for correlation. If no sample factors were specified, this comparison was performed with treatment/control labels.

The following graphic shows the soft-thresholding power chosen for building clusters. The power is chosen by examining properties of the resulting network, such as how closely it approximates a scale-free topology.
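A simplified sketch of the scale-free criterion used in WGCNA: for each candidate power beta, build the unsigned adjacency |cor|^beta, compute each gene's connectivity, and report the signed R² of log10 p(k) against log10 k (positive when the connectivity distribution decays). Binning by histogram midpoints is a simplification, the candidate powers are hypothetical, and the 0.85 cutoff mirrors the default `RsquaredCut` of WGCNA's `pickSoftThreshold`; the report does not state which settings DEgenes Hunter used.

```python
import numpy as np

def scale_free_fit(expr, powers=(1, 2, 4, 6, 8, 10, 12), n_bins=10):
    """Signed scale-free fit index per candidate power (expr: samples x genes)."""
    corr = np.abs(np.corrcoef(np.asarray(expr, dtype=float), rowvar=False))
    np.fill_diagonal(corr, 0.0)                 # no self-connections
    fit = {}
    for beta in powers:
        k = (corr ** beta).sum(axis=1)          # connectivity per gene
        freq, edges = np.histogram(k, bins=n_bins)
        mids = (edges[:-1] + edges[1:]) / 2     # bin midpoints (simplification)
        used = (freq > 0) & (mids > 0)
        if used.sum() < 3:                      # too few points for a fit
            fit[beta] = float("nan")
            continue
        x = np.log10(mids[used])
        y = np.log10(freq[used] / freq.sum())
        slope = np.polyfit(x, y, 1)[0]
        r2 = np.corrcoef(x, y)[0, 1] ** 2
        fit[beta] = float(-np.sign(slope) * r2)
    return fit

def pick_power(fit, cutoff=0.85):
    """Smallest power whose signed fit reaches the cutoff, else None."""
    for beta in sorted(fit):
        if fit[beta] >= cutoff:                 # NaN compares False
            return beta
    return None

rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 50))                # hypothetical: 20 samples, 50 genes
fit = scale_free_fit(expr)
print(fit, pick_power(fit))
```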

In total there were 52 clusters. The following plot shows the number of genes per cluster:

Module Membership distribution

Cluster assignment vs highest module membership (MM)

This plot shows, for each gene, the cluster ID assigned by WGCNA vs. the cluster whose eigengene has the highest correlation with that gene (module membership, MM).
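Module membership in WGCNA is the correlation between a gene's expression profile and a module eigengene. The sketch below computes the full genes x modules MM matrix and, for each gene, the module with the highest correlation, which is the quantity compared against the WGCNA assignment here. Input shapes and the function name are assumptions.

```python
import numpy as np

def module_membership(expr, eigengenes):
    """MM matrix (genes x modules) and best-matching module index per gene.

    expr: samples x genes; eigengenes: samples x modules.
    """
    expr = np.asarray(expr, dtype=float)
    eigengenes = np.asarray(eigengenes, dtype=float)
    n_genes = expr.shape[1]
    full = np.corrcoef(expr, eigengenes, rowvar=False)  # all columns against all columns
    mm = full[:n_genes, n_genes:]                        # genes x modules block
    return mm, np.argmax(mm, axis=1)

expr = np.array([[2, 3], [4, 1], [6, 3], [8, 1]])        # hypothetical genes g1, g2
eig = np.array([[1, 1], [2, -1], [3, 1], [4, -1]])       # hypothetical ME0, ME1
mm, best = module_membership(expr, eig)
print(best)  # [0 1]
```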

Cluster vs. factors correlation

The following plots show the correlation between the different modules and specified factors. This is done using eigengenes, which can be broadly thought of as the average expression pattern for the genes in a given cluster. MEn refers to the eigengene for cluster n.
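WGCNA defines a module eigengene as the first principal component of the module's standardized expression matrix. A minimal sketch follows; flipping the sign so the eigengene correlates positively with the module's average standardized expression roughly mirrors the default alignment in WGCNA's `moduleEigengenes`. The function name is hypothetical, and genes with zero variance would need to be removed first.

```python
import numpy as np

def module_eigengene(module_expr):
    """First principal component of a module (module_expr: samples x genes)."""
    x = np.asarray(module_expr, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # standardize each gene
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]                                # sample scores on PC1
    average = z.mean(axis=1)                           # mean standardized profile
    if np.corrcoef(eigengene, average)[0, 1] < 0:
        eigengene = -eigengene                         # align with the average
    return eigengene

print(module_eigengene([[1, 2], [2, 4], [3, 6], [4, 8]]))  # increases with the genes
```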

This plot shows the correlation between clusters (eigengenes) and factors directly.

WGCNA eigengene clustering

WGCNA dendrogram showing the distances between the eigengenes and the factors. Distances were calculated from signed correlation, so the closer two elements are, the more positively correlated they are.

Eigengene clustering (absolute correlation)

WGCNA-like dendrogram showing the distances between the eigengenes and the factors. Distances were calculated from absolute correlation, so the closer two elements are, the higher their absolute correlation.
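The difference between the two dendrograms comes from the distance used. A common choice, assumed here because the report does not give its exact formula, is d = 1 - r for signed correlation and d = 1 - |r| for absolute correlation; under the second, strongly anti-correlated items end up close together. The function name is hypothetical.

```python
import numpy as np

def eigengene_distances(values):
    """Signed (1 - r) and absolute (1 - |r|) distance matrices.

    values: samples x variables (eigengenes and numeric factors as columns).
    """
    r = np.corrcoef(np.asarray(values, dtype=float), rowvar=False)
    return 1.0 - r, 1.0 - np.abs(r)

signed, absolute = eigengene_distances([[1, -1], [2, -2], [3, -3]])
print(signed[0, 1], absolute[0, 1])  # far apart when signed, together when absolute
```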

Correlation network between modules and factors

This plot shows modules (black) and factors (green) as nodes. Correlation coefficients above 0.8 (red) and below -0.8 (blue) are represented as edges.
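Selecting the edges for this network can be sketched as a scan over the upper triangle of the correlation matrix between modules and factors, keeping pairs with r > 0.8 (red) or r < -0.8 (blue). The function name and node names are hypothetical.

```python
def correlation_edges(corr, names, cutoff=0.8):
    """Edges (node_a, node_b, color) for |r| beyond cutoff in a square matrix."""
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):            # each unordered pair once
            r = corr[i][j]
            if r > cutoff:
                edges.append((names[i], names[j], "red"))
            elif r < -cutoff:
                edges.append((names[i], names[j], "blue"))
    return edges

corr = [[1.0, 0.9, -0.85], [0.9, 1.0, 0.1], [-0.85, 0.1, 1.0]]
print(correlation_edges(corr, ["ME1", "ME2", "treat"]))
```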

Correlation between all clusters and factors

Detailed package results comparison

This is an advanced section comparing the output of the packages used to analyse the data. The data shown here do not necessarily have any biological implication.

P-value Distributions

Distributions of p-values, unadjusted and adjusted for multiple testing (FDR)
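DESeq2 adjusts p-values with the Benjamini-Hochberg procedure by default. A NumPy sketch of that adjustment, following the same steps as R's `p.adjust(method = "BH")`, is shown below; the function name is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)[::-1]                       # largest p-value first
    ranks = np.arange(n, 0, -1)                       # ranks n, n-1, ..., 1
    adjusted = np.minimum.accumulate(p[order] * n / ranks)  # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)            # cap at 1, restore input order
    return out

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```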

FDR Correlations

Correlations between packages of the p-values adjusted for multiple testing (FDR) and of the log fold changes.

Values of options used to run DEGenesHunter

First column contains the option names; second column contains the given values for each option in this run.

Option Value
minpack_common 1
p_val_cutoff 0.05
lfc 1
modules DW
active_modules 1